Although weakly-supervised techniques can reduce the labeling effort, it is unclear whether a saliency model trained with weakly-supervised data (e.g., point annotations) can match the performance of its fully-supervised version. This paper attempts to answer this unexplored question by proving a hypothesis: there exists a point-labeled dataset such that saliency models trained on it achieve performance equivalent to that obtained by training on the densely annotated dataset. To prove this conjecture, we propose a novel yet effective adversarial trajectory-ensemble active learning (ATAL) method. Our contributions are three-fold: 1) Our proposed adversarial-attack-triggered uncertainty overcomes the overconfidence of existing active learning methods and accurately locates uncertain pixels. 2) Our proposed trajectory-ensemble uncertainty estimation method retains the advantages of ensemble networks while significantly reducing the computational cost. 3) Our proposed relationship-aware diversity sampling algorithm overcomes oversampling while boosting performance. Experimental results show that ATAL can find such a point-labeled dataset: a saliency model trained on it obtains $97\%$ -- $99\%$ of the performance of its fully-supervised version with only ten annotated points per image.
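Camouflaged object detection (COD), which segments objects that are elegantly blended into their surroundings, is a valuable yet challenging task. Existing deep-learning methods often struggle to accurately identify camouflaged objects with complete and fine object structure. To this end, in this paper we propose a novel boundary-guided network (BGNet) for camouflaged object detection. Our method explores valuable, additional object-related edge semantics to guide representation learning for COD, which forces the model to generate features that highlight object structure and thereby promotes camouflaged object detection with accurate boundary localization. Extensive experiments on three challenging benchmark datasets demonstrate that BGNet significantly outperforms 18 existing state-of-the-art methods under four widely used evaluation metrics. Our code is publicly available at: https://github.com/thograce/bgnet.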
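Existing state-of-the-art (SOTA) video salient object detection (VSOD) models have widely followed a short-term methodology, which dynamically determines the balance between spatial and temporal saliency fusion by considering only a limited number of current consecutive frames. However, the short-term methodology has one critical limitation: it conflicts with the real mechanism of our visual system, which is a typical long-term methodology. As a result, failure cases keep showing up in the results of current SOTA models, and the short-term methodology has become the major technical bottleneck. To solve this problem, this paper proposes a novel VSOD approach that performs VSOD in a complete long-term way. Our approach converts sequential VSOD, a sequential task, into a data mining problem: it decomposes the input video sequence into object proposals in advance and then mines the salient object proposals as easily as possible. Since all object proposals are available simultaneously, the proposed approach is a complete long-term approach, which can alleviate some difficulties rooted in conventional short-term approaches. In addition, we devise an online updating scheme that captures the most representative and trustworthy pattern profile of the salient objects and outputs frame-wise saliency maps with rich details that are smooth both spatially and temporally. The proposed approach outperforms almost all SOTA models on five widely used benchmark datasets.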
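In this paper, we propose a novel nonconvex approach to robust principal component analysis (RPCA) for hyperspectral image (HSI) denoising, which focuses on simultaneously developing more accurate approximations to the rank and the column-wise sparsity of the low-rank and sparse components, respectively. In particular, the new method adopts the log-determinant rank approximation and a novel $\ell_{2,\log}$ norm to restrict the local low-rank and column-wise sparse properties of the component matrices, respectively. For the $\ell_{2,\log}$-regularized shrinkage problem, we develop an efficient closed-form solution, which we name the $\ell_{2,\log}$-shrinkage operator. The new regularization and the corresponding operator can be used generally in other problems that require column-wise sparsity. Moreover, we impose spatial-spectral total variation regularization on the log-based nonconvex RPCA model, which enhances the global piecewise smoothness and spectral consistency of the recovered HSI from both the spatial and spectral views. Extensive experiments on both simulated and real HSIs demonstrate the effectiveness of the proposed method in denoising HSIs.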
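Thanks to the rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily. However, deep-learning-based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences are available with real fixations recorded in real visual-audio environments. Hence, it would be neither efficient nor necessary to recollect real fixations under the same visual-audio circumstances. To address this problem, this paper promotes a novel weakly-supervised approach to alleviate the demand for large-scale training sets in visual-audio model training. Using only video category tags, we propose selective class activation mapping (SCAM) and its upgrade (SCAM+). In the spatial-temporal-audio circumstance, the former follows a coarse-to-fine strategy to select the most discriminative regions, and these regions usually exhibit high consistency with real human-eye fixations. The latter equips SCAM with an additional multi-granularity perception mechanism, making the whole process more consistent with the real human visual system. Moreover, we distill knowledge from these regions to obtain completely new spatial-temporal-audio (STA) fixation prediction (FP) networks, enabling broad applications in cases where video tags are not available. Without resorting to any real human-eye fixations, the performance of these STA FP networks is comparable to that of fully-supervised networks. The code and results are publicly available at https://github.com/guotaowang/stanet.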
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hampers graph-based account detection research. To address these issues, we propose the Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB is built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extract the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we perform a thorough evaluation of MGTAB and other public datasets. Our experiments show that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
As one of the prevalent methods for building automation systems, Imitation Learning (IL) shows promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model on randomly masked demonstrations and uses the conventional evaluation outcome, the environment returns, as the coefficients to build an importance map. We also conduct experiments to investigate three major questions concerning the equality of frames' importance, the effectiveness of the importance map, and the connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames in the demonstrations.
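The masking-and-weighting loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-frame normalization, and the toy return function (which stands in for retraining an IL model on the masked demonstration and evaluating it in the environment) are all assumptions made for illustration.

```python
import random


def importance_map(num_frames, train_and_evaluate, num_masks=2000,
                   keep_prob=0.5, seed=0):
    """Estimate per-frame importance from randomly masked demonstrations.

    `train_and_evaluate` is a hypothetical callback: it receives a 0/1 mask
    over frames and returns the environment return obtained after retraining
    on the frames that were kept.
    """
    rng = random.Random(seed)
    weighted = [0.0] * num_frames  # sum of returns over masks keeping frame i
    counts = [0] * num_frames      # number of masks that kept frame i
    for _ in range(num_masks):
        mask = [1 if rng.random() < keep_prob else 0 for _ in range(num_frames)]
        ret = train_and_evaluate(mask)
        for i, kept in enumerate(mask):
            weighted[i] += ret * kept
            counts[i] += kept
    # Average return over the masks in which each frame was kept
    # (an assumed normalization, chosen for simplicity).
    return [w / c if c else 0.0 for w, c in zip(weighted, counts)]


def toy_return(mask):
    # Hypothetical stand-in for retraining: only frames 2 and 5 matter.
    return 1.0 * mask[2] + 0.5 * mask[5]


scores = importance_map(8, toy_return)
top_two = sorted(range(8), key=lambda i: scores[i], reverse=True)[:2]
```

In this toy setting the two frames that drive the return receive the highest scores, which is the behavior the abstract attributes to the importance map.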
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity. The proposed model optimizes the weighted average of mean and percentile performances, and it covers the distributionally robust MDPs and the distributionally robust chance-constrained MDPs (both under reward ambiguity) as special cases. By considering that the unknown reward distribution lies in a Wasserstein ambiguity set, we derive a tractable reformulation of our model. In particular, we show that the return-risk model can also account for risk from an uncertain transition kernel when one only seeks deterministic policies, and that a distributionally robust MDP under the percentile criterion can be reformulated as its nominal counterpart at an adjusted risk level. A scalable first-order algorithm is designed to solve large-scale problems, and we demonstrate the advantages of our proposed model and algorithm through numerical experiments.
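The "weighted average of mean and percentile performances" can be made concrete with a small empirical sketch. It evaluates the objective for one fixed set of sampled returns only; it does not include the Wasserstein ambiguity set or any optimization over policies. The function name, the parameter names, and the empirical percentile convention are assumptions, not taken from the paper.

```python
def return_risk(returns, weight, level):
    """Weighted average of the sample mean and an empirical lower percentile.

    weight = 1 recovers the mean criterion; weight = 0 recovers the
    percentile criterion at the given level (assumed convention).
    """
    ordered = sorted(returns)
    mean = sum(ordered) / len(ordered)
    # Empirical percentile: the order statistic at index floor(level * n),
    # clipped to the last sample.
    index = min(len(ordered) - 1, int(level * len(ordered)))
    percentile = ordered[index]
    return weight * mean + (1.0 - weight) * percentile


sampled_returns = [0.0, 2.0, 4.0, 10.0]        # hypothetical policy returns
balanced = return_risk(sampled_returns, weight=0.5, level=0.25)
mean_only = return_risk(sampled_returns, weight=1.0, level=0.25)
percentile_only = return_risk(sampled_returns, weight=0.0, level=0.25)
```

With these samples the mean is 4.0 and the 25% percentile under the assumed convention is 2.0, so the balanced objective evaluates to 3.0.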